The Center of Policing Equity is an organisation that brings together a mixed group of professionals, including research scientists, equity experts, data analysts, and community trainers. Their common goal is to create unbiased and more rational systems for all individuals. If you’d like to learn more about their work, please visit their website at https://policingequity.org.
The dataset we have is a subset of a large collection of data gathered from police departments throughout the United States. It includes various records related to police behavior such as the use of force, stops of vehicles and pedestrians, crime data, and incident details such as date and time. The dataset we have is limited to Dallas, Texas in the year 2016.
library(ggplot2)
library(ggforce)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(scales)
library(RColorBrewer)
library(viridis)
## Loading required package: viridisLite
##
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
##
## viridis_pal
library(leaflet)
library(leaflet.extras)
library(htmlwidgets)
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ tibble 3.1.8 ✔ purrr 0.3.5
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ readr::col_factor() masks scales::col_factor()
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks plotly::filter(), stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(maps)
##
## Attaching package: 'maps'
##
## The following object is masked from 'package:purrr':
##
## map
##
## The following object is masked from 'package:viridis':
##
## unemp
library(gridExtra)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(treemapify)
library(calendR)
## ~~ Package calendR
## Visit https://r-coder.com/ for R tutorials ~~
library(ggpubr)
The data pre-processing stage is mainly focused on understanding the types of columns in the data and the structure of the dataset. The null values in the data are also replaced with suitable string, “Record Missing”. Removing rows or columns with null values may not be appropriate as the null values could indicate a lack of data for a particular category or a completely different type of crime that does not have data for that column. Therefore, it is important to retain the null values, but label them as “not available (data)”.
# Load the data
data <- read.csv("37-00049_UOF-P_2016_prepped.csv")
# Removing first row as it has column names only
data = data[-1,]
attach(data)
head(data)
## INCIDENT_DATE INCIDENT_TIME UOF_NUMBER OFFICER_ID OFFICER_GENDER
## 2 9/3/16 4:14:00 AM 37702 10810 Male
## 3 3/22/16 11:00:00 PM 33413 7706 Male
## 4 5/22/16 1:29:00 PM 34567 11014 Male
## 5 1/10/16 8:55:00 PM 31460 6692 Male
## 6 11/8/16 2:30:00 AM 37879, 37898 9844 Male
## 7 9/11/16 7:20:00 PM 36724 9855 Male
## OFFICER_RACE OFFICER_HIRE_DATE OFFICER_YEARS_ON_FORCE OFFICER_INJURY
## 2 Black 5/7/14 2 No
## 3 White 1/8/99 17 Yes
## 4 Black 5/20/15 1 No
## 5 Black 7/29/91 24 No
## 6 White 10/4/09 7 No
## 7 White 6/10/09 7 No
## OFFICER_INJURY_TYPE OFFICER_HOSPITALIZATION SUBJECT_ID SUBJECT_RACE
## 2 No injuries noted or visible No 46424 Black
## 3 Sprain/Strain Yes 44324 Hispanic
## 4 No injuries noted or visible No 45126 Hispanic
## 5 No injuries noted or visible No 43150 Hispanic
## 6 No injuries noted or visible No 47307 Black
## 7 No injuries noted or visible No 46549 White
## SUBJECT_GENDER SUBJECT_INJURY SUBJECT_INJURY_TYPE
## 2 Female Yes Non-Visible Injury/Pain
## 3 Male No No injuries noted or visible
## 4 Male No No injuries noted or visible
## 5 Male Yes Laceration/Cut
## 6 Male No No injuries noted or visible
## 7 Female No No injuries noted or visible
## SUBJECT_WAS_ARRESTED SUBJECT_DESCRIPTION SUBJECT_OFFENSE
## 2 Yes Mentally unstable APOWW
## 3 Yes Mentally unstable APOWW
## 4 Yes Unknown APOWW
## 5 Yes FD-Unknown if Armed Evading Arrest
## 6 Yes Unknown Other Misdemeanor Arrest
## 7 Yes Unknown Assault/FV
## REPORTING_AREA BEAT SECTOR DIVISION LOCATION_DISTRICT STREET_NUMBER
## 2 2062 134 130 CENTRAL D14 211
## 3 1197 237 230 NORTHEAST D9 7647
## 4 4153 432 430 SOUTHWEST D6 716
## 5 4523 641 640 NORTH CENTRAL D11 5600
## 6 2167 346 340 SOUTHEAST D7 4600
## 7 1134 235 230 NORTHEAST D9 1234
## STREET_NAME STREET_DIRECTION STREET_TYPE
## 2 Ervay N St.
## 3 Ferguson NULL Rd.
## 4 bimebella dr NULL Ln.
## 5 LBJ NULL Frwy.
## 6 Malcolm X S Blvd.
## 7 Peavy NULL Rd.
## LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION LOCATION_CITY LOCATION_STATE
## 2 211 N ERVAY ST Dallas TX
## 3 7647 FERGUSON RD Dallas TX
## 4 716 BIMEBELLA LN Dallas TX
## 5 5600 L B J FWY Dallas TX
## 6 4600 S MALCOLM X BLVD Dallas TX
## 7 1234 PEAVY RD Dallas TX
## LOCATION_LATITUDE LOCATION_LONGITUDE INCIDENT_REASON REASON_FOR_FORCE
## 2 32.782205 -96.797461 Arrest Arrest
## 3 32.798978 -96.717493 Arrest Arrest
## 4 32.73971 -96.92519 Arrest Arrest
## 5 Arrest Arrest
## 6 Arrest Arrest
## 7 32.837527 -96.695566 Arrest Arrest
## TYPE_OF_FORCE_USED1 TYPE_OF_FORCE_USED2 TYPE_OF_FORCE_USED3
## 2 Hand/Arm/Elbow Strike
## 3 Joint Locks
## 4 Take Down - Group
## 5 K-9 Deployment
## 6 Verbal Command Take Down - Arm
## 7 Hand Controlled Escort
## TYPE_OF_FORCE_USED4 TYPE_OF_FORCE_USED5 TYPE_OF_FORCE_USED6
## 2
## 3
## 4
## 5
## 6
## 7
## TYPE_OF_FORCE_USED7 TYPE_OF_FORCE_USED8 TYPE_OF_FORCE_USED9
## 2
## 3
## 4
## 5
## 6
## 7
## TYPE_OF_FORCE_USED10 NUMBER_EC_CYCLES FORCE_EFFECTIVE
## 2 NULL Yes
## 3 NULL Yes
## 4 NULL Yes
## 5 NULL Yes
## 6 NULL No, Yes
## 7 NULL Yes
summary(data)
## INCIDENT_DATE INCIDENT_TIME UOF_NUMBER OFFICER_ID
## Length:2383 Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## OFFICER_GENDER OFFICER_RACE OFFICER_HIRE_DATE
## Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## OFFICER_YEARS_ON_FORCE OFFICER_INJURY OFFICER_INJURY_TYPE
## Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## OFFICER_HOSPITALIZATION SUBJECT_ID SUBJECT_RACE
## Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## SUBJECT_GENDER SUBJECT_INJURY SUBJECT_INJURY_TYPE SUBJECT_WAS_ARRESTED
## Length:2383 Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## SUBJECT_DESCRIPTION SUBJECT_OFFENSE REPORTING_AREA BEAT
## Length:2383 Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## SECTOR DIVISION LOCATION_DISTRICT STREET_NUMBER
## Length:2383 Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## STREET_NAME STREET_DIRECTION STREET_TYPE
## Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION LOCATION_CITY
## Length:2383 Length:2383
## Class :character Class :character
## Mode :character Mode :character
## LOCATION_STATE LOCATION_LATITUDE LOCATION_LONGITUDE INCIDENT_REASON
## Length:2383 Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## REASON_FOR_FORCE TYPE_OF_FORCE_USED1 TYPE_OF_FORCE_USED2 TYPE_OF_FORCE_USED3
## Length:2383 Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## TYPE_OF_FORCE_USED4 TYPE_OF_FORCE_USED5 TYPE_OF_FORCE_USED6
## Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## TYPE_OF_FORCE_USED7 TYPE_OF_FORCE_USED8 TYPE_OF_FORCE_USED9
## Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## TYPE_OF_FORCE_USED10 NUMBER_EC_CYCLES FORCE_EFFECTIVE
## Length:2383 Length:2383 Length:2383
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
# Dataset structure
str(data)
## 'data.frame': 2383 obs. of 47 variables:
## $ INCIDENT_DATE : chr "9/3/16" "3/22/16" "5/22/16" "1/10/16" ...
## $ INCIDENT_TIME : chr "4:14:00 AM" "11:00:00 PM" "1:29:00 PM" "8:55:00 PM" ...
## $ UOF_NUMBER : chr "37702" "33413" "34567" "31460" ...
## $ OFFICER_ID : chr "10810" "7706" "11014" "6692" ...
## $ OFFICER_GENDER : chr "Male" "Male" "Male" "Male" ...
## $ OFFICER_RACE : chr "Black" "White" "Black" "Black" ...
## $ OFFICER_HIRE_DATE : chr "5/7/14" "1/8/99" "5/20/15" "7/29/91" ...
## $ OFFICER_YEARS_ON_FORCE : chr "2" "17" "1" "24" ...
## $ OFFICER_INJURY : chr "No" "Yes" "No" "No" ...
## $ OFFICER_INJURY_TYPE : chr "No injuries noted or visible" "Sprain/Strain" "No injuries noted or visible" "No injuries noted or visible" ...
## $ OFFICER_HOSPITALIZATION : chr "No" "Yes" "No" "No" ...
## $ SUBJECT_ID : chr "46424" "44324" "45126" "43150" ...
## $ SUBJECT_RACE : chr "Black" "Hispanic" "Hispanic" "Hispanic" ...
## $ SUBJECT_GENDER : chr "Female" "Male" "Male" "Male" ...
## $ SUBJECT_INJURY : chr "Yes" "No" "No" "Yes" ...
## $ SUBJECT_INJURY_TYPE : chr "Non-Visible Injury/Pain" "No injuries noted or visible" "No injuries noted or visible" "Laceration/Cut" ...
## $ SUBJECT_WAS_ARRESTED : chr "Yes" "Yes" "Yes" "Yes" ...
## $ SUBJECT_DESCRIPTION : chr "Mentally unstable" "Mentally unstable" "Unknown" "FD-Unknown if Armed" ...
## $ SUBJECT_OFFENSE : chr "APOWW" "APOWW" "APOWW" "Evading Arrest" ...
## $ REPORTING_AREA : chr "2062" "1197" "4153" "4523" ...
## $ BEAT : chr "134" "237" "432" "641" ...
## $ SECTOR : chr "130" "230" "430" "640" ...
## $ DIVISION : chr "CENTRAL" "NORTHEAST" "SOUTHWEST" "NORTH CENTRAL" ...
## $ LOCATION_DISTRICT : chr "D14" "D9" "D6" "D11" ...
## $ STREET_NUMBER : chr "211" "7647" "716" "5600" ...
## $ STREET_NAME : chr "Ervay" "Ferguson" "bimebella dr" "LBJ" ...
## $ STREET_DIRECTION : chr "N" "NULL" "NULL" "NULL" ...
## $ STREET_TYPE : chr "St." "Rd." "Ln." "Frwy." ...
## $ LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION: chr "211 N ERVAY ST" "7647 FERGUSON RD" "716 BIMEBELLA LN" "5600 L B J FWY" ...
## $ LOCATION_CITY : chr "Dallas" "Dallas" "Dallas" "Dallas" ...
## $ LOCATION_STATE : chr "TX" "TX" "TX" "TX" ...
## $ LOCATION_LATITUDE : chr "32.782205" "32.798978" "32.73971" "" ...
## $ LOCATION_LONGITUDE : chr "-96.797461" "-96.717493" "-96.92519" "" ...
## $ INCIDENT_REASON : chr "Arrest" "Arrest" "Arrest" "Arrest" ...
## $ REASON_FOR_FORCE : chr "Arrest" "Arrest" "Arrest" "Arrest" ...
## $ TYPE_OF_FORCE_USED1 : chr "Hand/Arm/Elbow Strike" "Joint Locks" "Take Down - Group" "K-9 Deployment" ...
## $ TYPE_OF_FORCE_USED2 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED3 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED4 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED5 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED6 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED7 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED8 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED9 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED10 : chr "" "" "" "" ...
## $ NUMBER_EC_CYCLES : chr "NULL" "NULL" "NULL" "NULL" ...
## $ FORCE_EFFECTIVE : chr " Yes" " Yes" " Yes" " Yes" ...
#checking data for missing values
data <- data %>% dplyr::mutate_all(~ifelse(.=="NULL", "Records Missing", .))
data <- data %>% dplyr::mutate_all(~ifelse(.=="Unknown", "Records Missing", .))
This data pertains to policing equity in Dallas, Texas in 2016. It consists of 2383 rows and spans across 47 columns. My goal is to analyze the dataset and investigate whether there is a racial connection to the crimes that occurred in Dallas. Specifically, I will examine whether the criminals belong to the same race or different races. To help you understand the dataset better, I have created a table that highlights the main entities and their corresponding features.
| Object | Features in Dataset |
|---|---|
| Incident | Date, Time, Reason |
| Officer | Id, Gender, Race, Injury, Injury type, Years on force etc. |
| Subject | Id, Gender, Race, Injury, Injury type, Arrested, Offense etc. |
| Street | Number, Name, Direction |
| Location | Reporting area, Division, District, Latitude, Longitude |
# Visualisation 1: Crime reported based on different race of the individuals
# Define a color gradient based on the height of the bars
changing_colors <- colorRamp(c("#FF9933", "#F44339"))
data %>%
filter(!is.na(SUBJECT_RACE)) %>%
group_by(SUBJECT_RACE) %>%
summarise(Count = n()) %>%
mutate(TotalCount = nrow(data)) %>%
mutate(Percentage = (Count/TotalCount) * 100) %>%
arrange(desc(Count)) %>%
ungroup() %>%
mutate(SUBJECT_RACE = reorder(SUBJECT_RACE, Count)) %>%
plot_ly(x = ~SUBJECT_RACE, y = ~Percentage, type = 'bar',
name = "Criminals Reported By Race",
marker = list(color = ~Percentage, colorscale = 'changing_colors')) %>%
layout(xaxis = list(title = "Criminal Race", tickangle = 0, showticklabels = TRUE),
yaxis = list(title = "Percentage", tickformat = ".2f"),
margin = list(l = 60, r = 10, b = 60, t = 30),
title = list(text = "Percentage of Reported Criminals Based on Race", x = 0.5, y = 0.95, font = list(size = 20)))
The plot clearly indicates that individuals of the Black race are reported more frequently than any other race for criminal activity, accounting for more than 50% of the data. The other two race groups reported for crime are Hispanic and White, in that order. The remaining subject races tend to have very little reported crime data.
#Visualisation 2: Count of reported subjects by gender
# count number of subjects by gender
subject_gender_counts <- data %>%
filter(!is.na(SUBJECT_GENDER)) %>%
group_by(SUBJECT_GENDER) %>%
summarise(count = n())
pie_chart <- plot_ly(
labels = subject_gender_counts$SUBJECT_GENDER,
values = subject_gender_counts$count,
type = "pie",
textposition = "inside",
textinfo = "label+percent",
insidetextorientation = "horizontal",
marker = list(
colors = c("#1f77b4", "#ff7f0e", "#2ca02c"),
line = list(color = "#FFFFFF", width = 1)
)
) %>%
layout(
title = list(
text = "Count of Reported Subjects by Gender",
font = list(size = 20)
),
margin = list(l = 60, r = 60, b = 60, t = 60),
legend = list(
x = 0.5,
y = -0.2,
bgcolor = "white",
bordercolor = "black",
borderwidth = 1,
orientation = "h"
)
)
pie_chart
#Visualisation 3: Genders reported for crimes in each subject race
# Create a summary of subject race and arrest status counts
subject_race_arrested <- data %>%
filter(!is.na(SUBJECT_RACE), !is.na(SUBJECT_GENDER)) %>%
group_by(SUBJECT_RACE, SUBJECT_GENDER) %>%
summarise(count = n())
## `summarise()` has grouped output by 'SUBJECT_RACE'. You can override using the
## `.groups` argument.
bar_chart <- plot_ly(subject_race_arrested, x = ~SUBJECT_RACE, y = ~count, color = ~SUBJECT_GENDER, type = "bar") %>%
layout(
title = list(
text = "Count of Arrests by Subject Gender and Race",
font = list(size = 20)
),
xaxis = list(title = "Subject Race", tickangle = 0, gap = 0.3),
yaxis = list(title = "Count"),
showlegend = TRUE,
bargap = 0.2
)
bar_chart
The pie chart clearly illustrates the distribution of reported subjects for criminal activity by gender, with women accounting for only 18.5% and men dominating with a significant majority of 81.1%. Furthermore, there is a barplot illustrating the distribution of male and female criminals across different races. Once again, the black male and female populations are found to be the most dominant among the different races. Clearly showing that both male and female are subject to more criminal activities in their respective gender category as well.
#Visualisation 4: Count of Subject Arrested by Race
# Create a summary of subject race and arrest status counts
subject_race_arrested <- data %>%
filter(!is.na(SUBJECT_RACE), !is.na(SUBJECT_WAS_ARRESTED)) %>%
group_by(SUBJECT_RACE, SUBJECT_WAS_ARRESTED) %>%
summarise(count = n())
## `summarise()` has grouped output by 'SUBJECT_RACE'. You can override using the
## `.groups` argument.
# Create a bar chart
bar_chart <- plot_ly(subject_race_arrested, x = ~SUBJECT_RACE, y = ~count, color = ~SUBJECT_WAS_ARRESTED, type = "bar") %>%
layout(
title = list(
text = "Count of Arrests by Subject Race",
font = list(size = 20)
),
xaxis = list(title = "Subject Race"),
yaxis = list(title = "Count"),
showlegend = TRUE
)
bar_chart
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
#Visualisation 5: Focus on major reported criminal races
incidents_race <- data[SUBJECT_RACE %in% c('Black', 'Hispanic', 'White'),]
indicents <- ggplot(incidents_race, aes(x = INCIDENT_REASON, fill = SUBJECT_RACE)) +
geom_bar() +
labs(title = "Actions Taken Against Subjects by Race", x = "Action Types", y= "Total Count with Racial Facet") +
coord_flip() +
theme_minimal() +
theme(legend.position = "bottom") +
scale_fill_manual(values = c("#1b9e77", "#d95f02", "#7570b3")) # custom color palette
ggplotly(indicents)
As previously mentioned, individuals from a Black racial background are more likely to be reported for a crime. This observation is reinforced by the barplot that shows the count of arrests, which indicates that Black subjects are more likely to be arrested for a reported incident when compared to individuals from other reported races such as Hispanics and Whites.
Total count with racial facet” refers to a way of categorising or presenting data where the total number of occurrences (count) is broken down by racial or ethnic groups (facets). The horizontal barplot suggests the actions taken against the subjects of different race. This plot shows that the arrests and service calls are the two primary actions by the police of response in Dallas. And in both the responses for action taken, it primarily taken against the Black followed by Hispanic and White.
#Visualisation 6: Arrests count by different reasons
# Group the data by INCIDENT_REASON and count the number of occurrences
# Count the number of occurrences of each unique value in the INCIDENT_REASON column
incident_count <- data %>%
count(INCIDENT_REASON)
# Define a color palette using RColorBrewer
colors <- brewer.pal(n = nrow(incident_count), name = "Set2")
## Warning in brewer.pal(n = nrow(incident_count), name = "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
pie_chart <- plot_ly(incident_count, labels = ~INCIDENT_REASON, values = ~n, type = "pie",
marker = list(colors = "colors")) %>%
layout(title = "Arrests Count by Incident Reasons")
pie_chart
The plot presents a comprehensive visualisation of the primary methods employed for reporting a crime. The data indicates that the majority of incidents are reported through arrests, accounting for 48.6% of the total. Service calls are the second most common means of reporting, accounting for 28.2% of incidents. This information is critical for law enforcement to make informed decisions about resource allocation and crime prevention strategies.
# Visualisation 7: Incidents by Subject Description for Black, Hispanic and White Subjects
# Filter data to only include Hispanic, Black and White SUBJECT_RACE
filtered_data <- data %>%
filter(SUBJECT_RACE %in% c("Hispanic", "Black", "White"))
# Group data by SUBJECT_RACE and SUBJECT_DESCRIPTION
grouped_data <- filtered_data %>%
group_by(SUBJECT_RACE, SUBJECT_DESCRIPTION) %>%
summarize(count = n())
## `summarise()` has grouped output by 'SUBJECT_RACE'. You can override using the
## `.groups` argument.
# Create a bar plot using ggplot2 with facet_wrap
facet_plot <- ggplot(grouped_data, aes(x = SUBJECT_DESCRIPTION, y = count, fill = SUBJECT_DESCRIPTION)) +
geom_bar(stat = "identity", position = "stack") +
labs(x = "Subject Description", y = "Count", fill = "Subject Description",
title = "Incidents by Subject Description for Hispanic, Black, and White Subjects") +
theme_bw() +
facet_wrap(~SUBJECT_RACE, ncol = 1) +
theme(axis.text.x = element_blank())
plotly_facet_plot <- ggplotly(facet_plot)
plotly_facet_plot
The illustration provides insights into the primary descriptions of subjects during their arrests. It is evident that individuals from the Black community are more likely to be reported under the description of influence of unknown drugs and mental instability. Conversely, individuals from the Hispanic and White communities are more likely to be suspected of being under the influence of alcohol.
#Visualisation 8: Count of Officers by Gender
officer_gender_counts <- data %>%
filter(!is.na(OFFICER_GENDER)) %>%
group_by(OFFICER_GENDER) %>%
summarise(count = n())
# create pie chart
pie_chart <- plot_ly(
labels = officer_gender_counts$OFFICER_GENDER,
values = officer_gender_counts$count,
type = "pie",
textposition = "inside",
textinfo = "label+percent",
insidetextorientation = "horizontal",
marker = list(
colors = c("#1f77b4", "#ff7f0e", "#2ca02c"),
line = list(color = "#FFFFFF", width = 1)
)
) %>%
layout(
title = list(
text = "Count of Officers by Gender",
font = list(size = 20)
),
margin = list(l = 60, r = 60, b = 60, t = 60),
legend = list(
x = 0.5,
y = -0.2,
bgcolor = "white",
bordercolor = "black",
borderwidth = 1,
orientation = "h"
)
)
pie_chart
# Visualisation 9: Officers by race
# Count the number of occurrences of each unique value in the OFFICER RACE column
officer_race_counts <- data %>% count(OFFICER_RACE)
# Define a color palette using RColorBrewer
colors <- brewer.pal(n = nrow(officer_race_counts), name = "Set2")
pie_chart <- plot_ly(officer_race_counts, labels = ~OFFICER_RACE, values = ~n, type = "pie",
marker = list(colors = colors)) %>%
layout(title = list(text = "Distribution of Officer Races"))
pie_chart
#Visualisation 10: # Group data by OFFICER_RACE and calculate average tenure
# Group the data by officer race and calculate the average years on force
duration_on_force <- data %>%
group_by(OFFICER_RACE) %>%
summarise(Average_Years_On_Force = mean(as.numeric(OFFICER_YEARS_ON_FORCE)))
pie_chart <- plot_ly(duration_on_force,
labels = ~OFFICER_RACE,
values = ~Average_Years_On_Force,
type = 'pie',
hole = 0.4,
textposition = "inside",
texttemplate = "%{value:.2f}") %>%
layout(title = "Average Years on Force by Officer Race",
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
pie_chart <- pie_chart %>%
add_annotations(x = 0.5, y = -0.15,
text = "",
showarrow = FALSE,
font = list(size = 14)) %>%
layout(legend = list(orientation = "h",
y = -0.25, x = 0.5,
font = list(size = 12)),
font = list(size = 12))
pie_chart
#Visualisation 11: Crime reported by police officers for different types of subject race
data_count <- data %>%
filter(!is.na(OFFICER_RACE) & !is.na(SUBJECT_RACE)) %>%
group_by(OFFICER_RACE, SUBJECT_RACE) %>%
summarise(Count = n())
## `summarise()` has grouped output by 'OFFICER_RACE'. You can override using the
## `.groups` argument.
plot <- ggplot(data_count, aes(x = OFFICER_RACE, y = Count, fill = SUBJECT_RACE)) +
geom_bar(stat = "identity") +
labs(title = "Reported Crime Grouped By Subject Race", x = "Officer Race", y = "Count") +
theme_bw()
# convert ggplot to plotly object and add hover text
plotly_obj <- ggplotly(plot, tooltip = c("OFFICER_RACE", "SUBJECT_RACE", "Count"))
plotly_obj <- plotly_obj %>%
layout(
legend = list(title = "Subject Race"),
xaxis = list(title = "Officer Race"),
yaxis = list(title = "Count"),
margin = list(l = 60, r = 10, b = 60, t = 30),
title = list(text = "Reported Crime Grouped By Subject Race", x = 0.5, y = 0.95, font = list(size = 20))
)
plotly_obj
It is worth noting that the majority of police officers are male, with a representation of 89.9%. Among the different racial groups, White officers represent the highest proportion at 61.7%, followed by Hispanic officers at 20.2%, and Black officers at 14.3%. It is important to acknowledge that while White officers report the highest number of crimes across all communities, it is not necessarily indicative of racial profiling. However, further analysis and investigation are required to determine the underlying factors that contribute to these patterns and to ensure that law enforcement is conducted in a fair and unbiased manner. In terms of the handling of cases, white officers are more likely to handle cases involving black subjects. This can be because of the more majority of white officers in the police force in comparison to other officer races.
#Visualisation 12: Subject Race that caused police officers hospitalisation
hosp_plot <- ggplotly(
ggplot(data, aes(x = OFFICER_HOSPITALIZATION, y = SUBJECT_RACE)) +
geom_point(alpha = 0.5, color = "steelblue") +
facet_grid(~ OFFICER_RACE) +
labs(
title = "Pair Plot for Officers Hospitalised in Action",
x = "Officer Hospitalisation",
y = "Subject Race"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 12, face = "bold"),
strip.text = element_text(size = 12, face = "bold")
)
)
hosp_plot
# Visualisation 13: Count of incidents in year 2016
# Creating an empty data frame with all dates in 2016
dates <- data.frame(Date = seq.Date(from = as.Date("2016-01-01"), to = as.Date("2016-12-31"), by = "days"))
# Counting all the incidents for each date in the dataset
incident_counts <- data %>%
mutate(Date = mdy(INCIDENT_DATE)) %>%
group_by(Date) %>%
summarize(Freq = n())
incidents_by_day <- left_join(dates, incident_counts, by = "Date")
incidents_by_day$Freq[is.na(incidents_by_day$Freq)] <- 0
plot_ly(
data = incidents_by_day,
x = ~Date,
y = ~weekdays(Date),
z = ~Freq,
type = "scatter",
mode = "markers",
marker = list(
symbol = "circle",
sizemode = "diameter",
sizeref = 0.1,
sizemin = 3,
color = ~Freq,
colorscale = list(c(0, max(incidents_by_day$Freq)), c("#FCFFDD", "#05A972")),
showscale = TRUE,
colorbar = list(title = "Number of Incidents")
)
) %>%
layout(
title = "Police Equiting 2016",
xaxis = list(title = NULL),
yaxis = list(title = NULL)
)
## Warning: 'scatter' objects don't have these attributes: 'z'
## Valid attributes include:
## 'cliponaxis', 'connectgaps', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'fill', 'fillcolor', 'fillpattern', 'groupnorm', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hoveron', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'legendgroup', 'legendgrouptitle', 'legendrank', 'line', 'marker', 'meta', 'metasrc', 'mode', 'name', 'opacity', 'orientation', 'selected', 'selectedpoints', 'showlegend', 'stackgaps', 'stackgroup', 'stream', 'text', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
# Visualisation 14: Year, Month and Hourly Incident Rates
data$INCIDENT_DATE <- as.Date(data$INCIDENT_DATE, format = "%m/%d/%Y")
data$INCIDENT_DATE <- gsub("00","20",data$INCIDENT_DATE)
data$INCIDENT_DATE <- as.Date(data$INCIDENT_DATE, format = "%Y-%m-%d")
data$INCIDENT_TIME <- format(strptime(data$INCIDENT_TIME, "%I:%M:%S %p"), "%H:%M:%S")
data$INCIDENT_MONTH <- months(as.Date(data$INCIDENT_DATE))
data$INC_MONTH <-format(data$INCIDENT_DATE,"%m")
data$INCIDENT_HOUR <- as.numeric(substr(data$INCIDENT_TIME, 0, 2))
data$INCIDENT_DAY <- wday(data$INCIDENT_DATE, label=TRUE)
data$INC_HOUR <- substr(data$INCIDENT_TIME, 0, 2)
data$INC_DATE <- substr(data$INCIDENT_DATE, 9, 10)
data_year <- data %>%
group_by(INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_DAY) %>%
summarize(count = n())
## `summarise()` has grouped output by 'INCIDENT_DATE', 'INCIDENT_MONTH'. You can
## override using the `.groups` argument.
data_month <- data %>%
group_by(INC_MONTH) %>%
summarize(count = n())
data_day <- data %>%
group_by(INCIDENT_DAY,INCIDENT_HOUR) %>%
summarize(count = n())
## `summarise()` has grouped output by 'INCIDENT_DAY'. You can override using the
## `.groups` argument.
data$INC_HOUR <- substr(data$INCIDENT_TIME, 0, 2)
data %>% group_by(INC_HOUR) %>%
summarize(avg =n()) -> data_hour_n
plot1 <- ggplot(data = data_year, aes(INCIDENT_DATE, count)) + geom_line(size=0.5, col="gray") +
geom_smooth(method = "loess", color = "purple", span = 1/5) + theme_bw() + labs(x="Months ", y= "INCIDENT COUNTS", title="Year vs Incident Counts")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
plot2 <- ggplot(data_month, aes(x=INC_MONTH, y =count, group=1)) + geom_line() + geom_line( size = 1,colour ="black") + labs(x="Months of 2016", y= "INCIDENT COUNTS", title="Months vs Incident Counts") + theme_bw()
plot3 <- ggplot(data_hour_n, aes(x = INC_HOUR, y = avg, group = "count")) + geom_line( size = 1, colour = "magenta") + labs(x="HOURS IN A DAY", y= "INCIDENT COUNTS", title="Hours vs Incident Counts")+ theme_bw() +
theme(axis.text.x=element_text(angle=-90, vjust=0.5)) +
labs(x = "Hour of Day", y = "count") + theme_bw()
plot4 <- ggplot(data_year, aes(count)) +
geom_density(alpha = 0.5, colour = "black", fill ="blue")+ labs(x="Incident counts", y= "Density", title="Distribution of Incident Counts") + theme_bw()
grid.arrange(plot1, plot2, plot3, plot4 , nrow=2)
## `geom_smooth()` using formula = 'y ~ x'
# Incidents by Month
ggplot(data = data, aes(x = INCIDENT_HOUR, fill = INCIDENT_MONTH, group = INCIDENT_MONTH)) +
geom_density(alpha = 0.5, adjust = 2) +
scale_fill_viridis_d(alpha = 0.2, begin = 0.2, end = 0.9, option = "turbo", direction = -1) +
labs(title = "Density Plot of Incidents by Month and Hour",
x = "Hour of the Incident",
y = "Density",
fill = "Month of Incident")
## Warning: Removed 10 rows containing non-finite values (`stat_density()`).
#Visualisation 15: Incident tracking through the year
# Group the data by date and count the number of incidents
crimes_rate_yrly <- data %>%
group_by(INCIDENT_DATE) %>%
summarise(Total_Crimes = n())
crimes_rate_yrly$INCIDENT_DATE <- as.Date(crimes_rate_yrly$INCIDENT_DATE, format = "%m/%d/%Y")
names(crimes_rate_yrly)[names(crimes_rate_yrly) == "INCIDENT_DATE"] <- "Date_Happened"
line_chart <- ggplot(crimes_rate_yrly, aes(x = Date_Happened, y = Total_Crimes)) +
geom_line(color = "red") +
geom_point(color = "red") +
labs(x = "Date", y = "Incident Count", title = "Incidents Tracking Through the Year") +
theme_minimal()
plot <- ggplotly(line_chart, tooltip = c("Total_Crimes"))
plot
#Visualisation 16: Incidents Counts Across Days and Months
# Group the data by day and month and count the number of incidents
df_monthday <- data %>%
group_by(INCIDENT_DAY, INC_MONTH) %>%
summarize(count = n())
## `summarise()` has grouped output by 'INCIDENT_DAY'. You can override using the
## `.groups` argument.
df_monthday <- df_monthday[complete.cases(df_monthday), ]
ggplot(df_monthday, aes(x= INCIDENT_DAY, y=INC_MONTH,fill = count)) + geom_tile( ) +
geom_text(aes(INCIDENT_DAY, INC_MONTH, label = count), color = "black", size = 4) + scale_y_discrete("Months",labels=c("January","February", "March", "April","May", "June","July","August", "September","October","November","December")) + labs(x="Days of Month", y= "Months", title=" Incident Counts across Days and Months")+
scale_fill_gradientn(colours = c("#3794bf", "#FFFFFF", "#df8640"))
Overall, there was a decrease in the number of incidents reported throughout the year, with February and March having the highest numbers and December the lowest. During the day, there were two peaks of incidents reported: one between 5 pm and 8 pm, and another smaller peak at 2 am. The lowest incidents occurred between 4 am and 10 am. The distribution of incident counts across the year shows a right skewness, with most days having fewer than 24 incidents reported. However, there were two peaks of incidents reported in several months, with normal distributions observed in March, June, July, January, August, and April, and left skewness observed in September. Incident counts also vary across different days of the week, with Sundays displaying a normal distribution of incidents. The average range of incidents reported per day and month falls between 3 to 20. These patterns and trends provide valuable insights for law enforcement agencies to allocate resources effectively and improve their response to incidents.
#Visualisation 17: Each type of force used for each subject race
data <- data %>% rename( F1 = TYPE_OF_FORCE_USED1, F2 = TYPE_OF_FORCE_USED2, F3 = TYPE_OF_FORCE_USED3, F4 = TYPE_OF_FORCE_USED4, F5 = TYPE_OF_FORCE_USED5, F6 = TYPE_OF_FORCE_USED6, F7 = TYPE_OF_FORCE_USED7, F8 = TYPE_OF_FORCE_USED8, F9 = TYPE_OF_FORCE_USED9, F10 = TYPE_OF_FORCE_USED10)
df_grouped <- data %>%
filter(SUBJECT_RACE %in% c("White", "Black", "Hispanic")) %>%
group_by(SUBJECT_RACE, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10 ) %>%
summarise(count = n()) %>%
pivot_longer(cols = F1:F10, names_to = "Force_Type", values_to = "Count") %>%
filter(Count > 0)
## `summarise()` has grouped output by 'SUBJECT_RACE', 'F1', 'F2', 'F3', 'F4',
## 'F5', 'F6', 'F7', 'F8', 'F9'. You can override using the `.groups` argument.
plot <- ggplot(df_grouped, aes(x = Force_Type, y = Count, fill = SUBJECT_RACE)) +
geom_col(position = "dodge") +
facet_wrap(~ SUBJECT_RACE, scales = "free_x") +
labs(title = "Each Type of Force Used for Each Subject Race", x = "Type of Force Used", y = "Count", fill = "Race") +
theme(legend.position = "bottom")
ggplotly(plot, tooltip = c("Force_Type", "Count", "SUBJECT_RACE"))
This plot shows the count of each type of force used for different subject races. The x-axis represents the type of force used, while the y-axis represents the count of that type of force. The bars are colored according to the subject race, and the plot is split into different facets for each race. It clearly shows that all the major subject race are prone to different measures of the use of force for arrests.
# Visualisation 18: Group the data by officer race and use of force
df_grouped <- data %>%
group_by(OFFICER_RACE, REASON_FOR_FORCE) %>%
summarise(count = n())
## `summarise()` has grouped output by 'OFFICER_RACE'. You can override using the
## `.groups` argument.
# created stacked bar chart
ggplot(df_grouped, aes(x = OFFICER_RACE, y = count, fill = REASON_FOR_FORCE)) +
geom_bar(stat = "identity") +
labs(x = "Officer Race", y = "Count", fill = "Reason for Force") +
theme_minimal()
# Contingency table to see how many incidents happened in which division of Dallas and by which race
table(data$DIVISION, data$SUBJECT_RACE)
##
## American Ind Asian Black Hispanic Other Records Missing White
## CENTRAL 0 4 302 93 3 14 147
## NORTH CENTRAL 0 0 117 67 3 5 127
## NORTHEAST 0 1 223 60 0 1 56
## NORTHWEST 0 0 54 76 5 2 54
## SOUTH CENTRAL 0 0 269 25 0 2 14
## SOUTHEAST 0 0 233 76 0 7 46
## SOUTHWEST 1 0 135 127 0 8 26
This table presents a summary of the distribution of individuals across different racial/ethnic groups in seven different regions. The data suggests that the highest count of Black individuals is in the Central region, where there are 302 recorded instances. Similarly, the South Central region has the highest count of Hispanic individuals with 25 records, while the Southwest region has the highest count of White individuals with 26 records. Interestingly, the Northeast region has the highest count of Asian individuals with only one more individual compared to the Central region. Conversely, only one American Indian individual is recorded in the Southwest region. The “Other” racial/ethnic category has a total count of 18 individuals across all regions. Moreover, the data indicates that the highest number of missing records is in the Central region with 14.
#Visualisation 19: Incidents by Division and Month
# Summarize the data
division_counts <- data %>%
group_by(INC_MONTH, DIVISION) %>%
summarize(count = n())
## `summarise()` has grouped output by 'INC_MONTH'. You can override using the
## `.groups` argument.
# Plot the data
ggplot(division_counts, aes(x = DIVISION, y = INC_MONTH, fill = count)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "red") +
geom_text(aes(label = count), color = "black", size = 3) +
labs(x = "Division", y = "Month", title = "Incidents by Division and Month")
# summarize the data by month and division
df_div <- data %>%
group_by(INC_MONTH, DIVISION) %>%
summarize(count = n())
## `summarise()` has grouped output by 'INC_MONTH'. You can override using the
## `.groups` argument.
# Incidents by division and month
ggplot(df_div, aes(x = INC_MONTH, y = count, group = DIVISION, color = DIVISION)) +
geom_line(size = 1.2) +
facet_wrap(~ DIVISION, ncol = 4) +
scale_color_brewer(palette = "Set1") +
theme(legend.position = "bottom") +
labs(x = "Month", y = "Incident Counts",
title = "Incidents by Division and Month",
subtitle = "Trends of incident counts by division over time",
color = "Division")
#Incidents in different divisions overtime
division_incidents <- as.data.frame(table(DIVISION, INCIDENT_DATE))
division_incidents$INCIDENT_DATE <- as.Date(division_incidents$INCIDENT_DATE, format = "%m/%d/%Y")
names(division_incidents)[names(division_incidents) == "INCIDENT_DATE"] <- "Date_Happened"
names(division_incidents)[names(division_incidents) == "Freq"] <- "Total_Crimes"
division_incidents <- ggplot(division_incidents, aes(x = Date_Happened, y = Total_Crimes)) +
geom_line(color = "black") +
facet_wrap(~DIVISION,ncol=2) +
theme(legend.position="none") +
labs(x="", y= "", title="Incidents frequency in different divisions over time")
ggplotly(division_incidents, tooltip = c("Total_Crimes"))
#Visualisation 20:
division_incidents_count <- data.frame(table(LOCATION_LONGITUDE, LOCATION_LATITUDE, DIVISION)) %>% filter(Freq != 0)
names(division_incidents_count)[names(division_incidents_count) == "Freq"] <- "Total_Crimes"
division_incidents_count <- division_incidents_count[-1,]
library(leaflet)
library(leaflet.extras)
leaflet() %>%
addProviderTiles("Stamen.TonerLite") %>%
addCircleMarkers(data=division_incidents_count,
lat = as.numeric(LOCATION_LATITUDE),
lng = as.numeric(LOCATION_LONGITUDE),
radius = ~Total_Crimes/100,
fillColor = 'darkBlue',
fillOpacity = 0.6,
stroke = FALSE,
label = ~Total_Crimes,
labelOptions = labelOptions(
noHide = TRUE,
direction = 'auto')
) %>%
addHeatmap(
data = division_incidents_count,
lng = as.numeric(LOCATION_LONGITUDE),
lat = as.numeric(LOCATION_LATITUDE),
radius = 20,
blur = 15
) %>%
addLegend(
position = 'bottomright',
pal = colorNumeric(palette = 'Blues', domain = division_incidents_count$Total_Crimes),
values = division_incidents_count$Total_Crimes/100,
title = 'Total Crimes',
opacity = 0.6
)
## Warning in pal(c(r[1], cuts, r[2])): Some values were outside the color scale
## and will be treated as NA
## Warning in validateCoords(lng, lat, funcName): Data contains 55 rows with either
## missing or invalid lat/lon values and will be ignored
## Warning in validateCoords(lng, lat, funcName): Data contains 55 rows with either
## missing or invalid lat/lon values and will be ignored
# Visualisation 21: Officers race by division
# summarize the data by division and officer race
df_race_division <- data %>%
group_by(DIVISION, OFFICER_RACE) %>%
summarize(count = n())
## `summarise()` has grouped output by 'DIVISION'. You can override using the
## `.groups` argument.
# create the pie chart
ggplot(df_race_division, aes(x = "", y = count, fill = OFFICER_RACE)) +
geom_bar(stat = "identity", width = 1) +
facet_wrap(~ DIVISION, ncol = 3) +
labs(x = "", y = "",
title = "Officer Race by Division",
fill = "Officer Race") +
coord_polar(theta = "y") +
theme_void() +
theme(legend.position = "bottom")
According to the data, the Central Division has reported the highest number of crimes with more than 40 incidents. In contrast, the Northwest Division has the lowest incident rates and the number of incidents has been decreasing over time. The South Central, Southeast, and Southwest Divisions have also shown a sharp decline in incidents over time. Meanwhile, the Northeast and North Central Divisions have shown a neutral trend. Towards the end of the year, almost all other divisions exhibit a decrease in incidents, and December has the lowest number of incidents in all Divisions except for Northeast. The Northeast Division has a constant and steady intermittent trend.
With the officer distribution in the different locations, it can be made certain that the assumption for racial profiling cannot be justified. There is need for more data-points to ensure that it can be one of the problems for Black community being reported for crime more than any other community in Dallas. The most evident thing that can be ensured from the above analysis is that there is an immediate need for localising crime and making policies to curb the crimes committed by the different communities at different locations.